Polibits, Vol. 53, pp. 43-48, 2016.
Abstract: We propose a segment-based weighting technique for genre classification of web pages. The technique uses character n-grams extracted from the URL of a web page rather than from its textual content. The main idea is to segment the URL and assign a weight to each segment. Experiments conducted on three known genre datasets show that our method achieves encouraging results.
Keywords: URL, genre classification, web page, segment weight
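
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the split into domain, path, and query segments, the weight values in SEGMENT_WEIGHTS, and the function names char_ngrams and weighted_url_ngrams are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical segment weights; the abstract does not state the actual values.
SEGMENT_WEIGHTS = {"domain": 1.0, "path": 2.0, "query": 0.5}


def char_ngrams(text, n):
    """Return all overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weighted_url_ngrams(url, n=3, weights=SEGMENT_WEIGHTS):
    """Map each character n-gram of a URL to a segment-weighted count.

    The URL is split into three assumed segments (domain, path, query).
    Each occurrence of an n-gram adds the weight of the segment in which
    it occurs, so n-grams from heavily weighted segments count for more.
    """
    parts = urlsplit(url)
    segments = {"domain": parts.netloc, "path": parts.path, "query": parts.query}
    features = defaultdict(float)
    for name, text in segments.items():
        for gram in char_ngrams(text, n):
            features[gram] += weights[name]
    return dict(features)
```

The resulting dictionary could serve as a feature vector for any standard classifier. Under this sketch, choosing the segment weights is the main design decision, because they control how much each part of the URL contributes to the genre prediction.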
PDF: A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages
http://dx.doi.org/10.17562/PB-53-4